can learn. There's hype in there. So I will do a little bit of explaining about what machine learning is if you're not familiar with that. Hopefully, I'll explain what it is and what it isn't. "In production," that is slightly a lie. This talk is about a product that was supposed to be in production at this point in time when I committed to doing this several months ago. We've done a lot of work to get it there, but take this with a slight grain of salt, in that I won't actually be able to talk about the learnings of how this thing has scaled so far. But we will be using scikit-learn. I am a data engineer at Simple. I will tell you a little bit more about what Simple is.

First, a little bit more background on me. I am Jeff Klukas. I came to technology through science. I went to grad school at the University of Wisconsin-Madison and worked on the Large Hadron Collider, which is really fun. This is me in the cavern of the CMS experiment in Switzerland, about 100 meters underground. So why do I mention this? Since I now work at a tech company, I mention it probably because it is cool. It was a pretty fun project to work on. I also mention it because when you think of machine learning, you often think of big data. And this was a project that was definitely big data. When this thing is running, there are collisions happening every 25 nanoseconds, and each one is a megabyte. It ends up being a huge amount of data. There's this whole cool trigger system to get it down to a manageable size, and you still end up with petabytes of data. So, petabytes: 1,000 gigabytes is a terabyte, and 1,000 terabytes is a petabyte. And you have hundreds of petabytes of data sitting on a disk somewhere, and they're doing distributed stuff to process it. So I got to do that in grad school. Never once did I do any machine learning in grad school. So huge data sets, no machine learning. Why? Some people did use machine learning in that realm. But it's kind of something you only resort to if you have to. Because in research, you want your results to be interpretable. You want to be able to explain what's going on. And I'm kind of proud of that, that we really did, I feel like, do a good job of relying on interpretable results. If you can make a plot and say, I cut out everything below this line on this plot, that is understandable. And those are the kinds of papers that we were writing. So lots of data, but making these very interpretable things.

On top of it, with machine learning, you're kind of handing data over to a black box. And then you say, what did it do in that black box? And you can talk about algorithms, and you can talk about linear algebra. But there's a lot more context you need to know. And in the end, there are a lot of details that are harder to reproduce. So these are reasons why you might not want to use machine learning for a problem. And what I'm talking about today is going to be machine learning on a very small data set, which is a couple of gigabytes. That was big data, no machine learning. Today, we're talking about machine learning without big data. And there are good reasons to do that or not to do that. And hopefully, you'll leave this with a little bit better understanding of why you might or might not want to use machine learning for your problem.

So a bit of an overview of the story today. We'll talk more about the problem that we're solving; in our case, it's about classifying text chats. We'll talk about that question of why we used machine learning for this problem.
Why might we not want to use it for this problem? We will walk through the steps of developing the model, so you'll get to actually see a bunch of Python code showing you what it looks like to interact with scikit-learn. And it turns out that humans love live demos and seeing things actually happen. So you humans will be able to experience that rather than just seeing slides and believing me. Then we'll talk through some complications of this. So you've built this kind of toy model. You've proved that it can work. And then you actually want to run this thing in production and do useful things with it. What gets complicated about that? How do you manage that whole lifecycle of creating a model and using it? Then lastly, we'll talk about, if you think that what we did here is garbage, how else might you decide to do this if you want to do machine learning? There are lots of different reasons why the shape of your organization and your team and your problem are likely not going to be exactly what is shown here. So other approaches might work for you.

So what is the problem that we're solving? A little bit of context. This is Simple. It's the whole idea of banking remade, with lovely design, equally lovely tools to help you save right inside your account, and genuine human goodness. I'm going to actually be focusing on this genuine human goodness part, which largely comes down to our customer relations. A big piece of what makes Simple different from some other banks is that we have a fantastic customer relations department, which all works at our headquarters in Portland, Oregon. About half of the company is customer relations. And they're great people. And the problem we're talking about today is about making their lives easier. So I work for Simple. I'm remote. Actually, I live in the Columbus area; I'm the only one in Ohio. So I was actually in Portland last week, and I'm glad to be back.

Categorizing chats. So for our customer relations department, the primary way that they end up interacting with customers is through the app. You can send a support message, or a chat. And then a customer relations agent gets to respond to that and be pleasant, and oftentimes sends you a GIF in the response. That data all makes it into our data warehouse. So what you're seeing right here is a fictionalized query, somewhat, to our data warehouse. Amazon Redshift is what we use there. It's a database. You can run a query against it, and you can look at a bunch of this chat data. So chats have a subject and a body when you initially submit them. And after you submit a chat, there is somebody in customer relations who's assigned to looking at the queue and triaging chats, assigning them categories so that they can go to different people who are specialized in different things. So one of the categories you see here is urgent. If you lose your card, it's something that's exposing you to fraud. It's exposing us to fraud. We want to deal with that as fast as we possibly can. So that's part of the reason for triage, is that hopefully we can find these really important things quickly and then make sure that somebody is assigned to them. So these are the categories that you might assign: urgent, customer education, a new product, incidents, other things.

So this is a great situation for machine learning, in that we have this data warehouse. We have this data set with hundreds of thousands of chat messages that we can go take a look at. And we can have a machine come look at this stuff.
And basically what we've got here, subject and body, these are kind of the question. And then category is the answer that we want to get out. So the problem is that we want to do the categorization automatically. We want a machine to handle this so that a human doesn't have to. And we already have this large corpus, this large set of data where we have a recorded answer. A human has already tagged these things. And we can use this as a training data set for figuring out if some approach that we have works or does not work. So luckily, we don't have to spin up Mechanical Turk and give our customer information to a bunch of random people on Mechanical Turk.

Just to be a little bit more explicit about what this looks like, this is in our web app. This is a case where I, Jeff, actually lost my card in St. Louis. And I sent a message. And this is basically what we're talking about here. So you might have this whole long chat with an agent, but it's just this initial contact where I say, hey, I lost my card. And the subject on this was "lost card." We want to categorize this thing so we can figure out who to send it to to get the chat started. These are some of the lovely customer relations people that we have. They are smart, empathetic people. And we don't want them to have to spend their time sitting and just reading through chats to categorize them. We want them to be using their empathy to actually respond to issues.

So how do we approach this problem? Obviously, we want to use machine learning, because it's got sparkles and hearts and big companies use it. It makes me feel warm and cuddly and smart. Those are bad reasons to use machine learning. Like I said before, you do sacrifice a lot of interpretability. So if I tell you that I want to use machine learning for this problem, your first question to me might be, couldn't we do something else? And you can imagine things we could do. We could write some simple rules, like: if the subject contains "lost card," maybe this is something we want to treat urgently. And you could imagine totally solving this by just having a list of rules that some human is coming up with. And that would be a great first pass. See if that works for you. If that works, then please do that instead of doing machine learning.

There are a couple of reasons why you might still want to do this. If you try that, but there's still a whole lot of cases that you aren't catching, there's a good chance that machine learning is going to allow you to get farther in terms of optimizing the accuracy and scope of the thing. It can also get very hairy to maintain a rule set like that. In our case, we have a data science department that wants to own figuring out how we categorize chats, and we have engineering that is actually taking those rules and putting them into production. And we've had a couple of cases where we've tried to do some simple rules, and it ended up being very difficult to keep that communication going. Do the data scientists go in and change the service code? With machine learning, one of the nice things is that you can create a model artifact, and that is something very easy for engineers to then take and plug into another service. It's something that data scientists can create that gets plugged in elsewhere. In our case, we also get to use a bunch of natural language processing techniques, which is a whole field of research.
And it's something that scikit-learn, for example, has great support for. So one reason going down this path was nice is that you get to piggyback on a lot of these things that already exist for the kind of problem that we want to look at. And management wanted us to use machine learning. And that's something. There is validity in that, in that our data science organization has greater ambitions as to the ways that they want to do this. And we want to take a small, understandable question to really prove out what this whole workflow looks like. So implementing this is hopefully the first step towards having infrastructure to be able to tackle other problems with machine learning in the future and do exciting new things.

So if you are in my shoes and you are an engineer who has never implemented a machine learning problem, and you're talking to data scientists, and they're like, we want to do this in Python, we think, and we're going to use scikit-learn, what would you do? You would go to the scikit-learn documentation, and you would try to figure out what on earth is going on. So I'm going to kind of walk through my journey of understanding this stuff. Scikit-learn's documentation is actually pretty great. There's some good stuff in there; I recommend you check it out. So if you go to the front page and you go up to the documentation tab, you'll get to the user guide. And it's actually got this really fantastic first section about what machine learning even is. So I'm just going to give you a little bit of a highlight here. It says a machine learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, and is, for instance, a multi-dimensional entry, it is said to have several attributes or features. So in our case, we have a data set with two features. There's the body of the message, and there's the subject of the message. So we have two features to feed in, and then we want to get some answer out, a classification.

There are generally two types of machine learning problems. There's supervised learning, and there's unsupervised learning. Unsupervised learning is for the case where you don't have the answers already. So if you didn't have these already classified, this is things like clustering. You can imagine taking all of those chats and saying, hey, algorithm, try to figure out some things that look similar to each other. Maybe that would be useful for even getting a sense of what categories I want to have. And you could take a look at those clusters. What do they have in common? Oh, it looks like lost card things are a category that I might want to consider. That's not what we're doing. We're doing supervised learning, and in particular, we're doing classification. So we already have a data set where we have inputs and what the classification should be. We're going to learn based on that data set, and then we are going to spit out a model that's able to take in new inputs and predict a classification for them.

OK, now we get to the fun stuff, the code. This first slide kind of lays out the whole thing. So be prepared. We're jumping in at the second line; in particular, we are creating a pipeline. So this is one of my favorite things about scikit-learn. It has this whole API with lots of bits and pieces, but it provides this nice kind of wrapper.
You can define a pipeline with a whole bunch of steps for how you want to process your data and then finally apply a classifier to it. And this ends up making our job of taking all of this stuff and turning it into an artifact a lot easier down the line. So this slide serves as an introduction to scikit-learn, and it serves as an introduction to natural language processing. So I'm going to go through each line a little bit here.

So, creating a pipeline. The first step in here, we're giving it a name; we're calling this step preprocess. And then we're passing in this message preprocessor. The message preprocessor and the text processor, in our case, are two classes that we have defined, but they inherit from this scikit-learn interface, a transformer mixin. It gives you this way where you just instantiate a class that's based on this mixin, and it becomes something that you can throw into a pipeline. So the message preprocessor, all it is doing is taking in some chunk of data that has a subject and a body in it, and turning it into a single string. And it has the subject weight in here. The subject weight simply means: include the subject that many times. We found that the subject tends to be more important than the stuff that's in the body. So this gives us a way to kind of dial up how important the subject should be. We just duplicate the subject a bunch of times, append that to the body, and that's the output of this preprocessor. So now we're down to a single string.

Then we use this text processor. It takes in stop words and a lemmatizer. Stop words, this is a concept from natural language processing. These are words like "and," "as," "but," all of those words that don't really add information; they're just kind of connecting the sentence together. So this is something you might want to play around with. We might want to include "Simple," "simple.com," other things that are specific to our domain that really aren't adding information to the message. And then a lemmatizer. This is another natural language processing term. A lemma is the form of a word that you would find in the dictionary. So if you think of "stopping," the word that you would look up is "stop." So a lemmatizer is just a chunk of code that will take in a bunch of words and change "stopping" into "stop," change "going" into "go," et cetera. So these are some of those precanned techniques that you can take advantage of. You can find lots of documentation on that. So this is cleaning it up.

The next steps are a count vectorizer and this term frequency transformer. These are just progressively moving away from the text we had at this point; now we're just kind of creating this mathematical glob that is easier for an algorithm to understand. So I'm not going to go into what those things are specifically. And then finally, now that we have this nice mathematical form of the data, we've turned it into these counts of words and transformed them, we throw it into this classifier. It's a gradient-boosted classifier. Don't ask me about the specifics of gradient-boosted classifiers; one of the data scientists handles that. OK. So this is the whole pipeline, all the steps that our data is going to go through.

Yeah, question. The question is whether XGBoost is part of scikit-learn. Correct, it's not. Great question. So one of the great things about scikit-learn is that it has become this cornerstone for doing machine learning in Python, but it doesn't have to stand on its own.
And a lot of other packages provide wrappers that make it easy to plug things into scikit-learn. So yes, xgboost is a separate package, but it provides a nice wrapper, and you can just throw it into a scikit-learn pipeline like this. And yeah, feel free to ask questions as we go through. Any other questions? Does scikit-learn come with its own classifiers? It does come with its own classifiers. How do you pass stuff into a pipeline object? So yeah, this pipeline object, this is literally a list that we're passing in here to the pipeline object. So it holds all this stuff. And then you'll see in the next slide how we pass data into this whole thing. One last question here. Quick question about scikit-learn. Yep. Is it based in part on the Natural Language Toolkit, or does it have its own implementation of that kind of functionality? Is the Natural Language Toolkit a specific library? Oh, NLTK, is that what it is? I have no idea about the overlap there. I think that we are using NLTK in some of the stuff that we're doing here. I can't remember the specifics of what's built in and what's not.

All right. So we have a pipeline. Let's actually do something with it. The first thing that we have to do is train the model with existing data. So this might trip you up a little bit if you've never heard of pandas. It's part of the Python ecosystem. Quick introduction to the Python data ecosystem: there's NumPy, which is a library that's just efficient arrays and stuff, kind of like basic building blocks. There's SciPy, which is a bunch of algorithms that are useful. And pandas is kind of like data science glue that makes things convenient. So pandas has this read_sql function. You just pass it a database connection and a query, and it spits out something that's called a pandas data frame. And it's basically a big matrix of information with a bunch of convenience functions. So we're pulling all this data out of our data warehouse, the category, subject, and body. And then we're breaking it up: y is the dependent variable, the thing that we want to figure out, which is the category. And X is the independent variable, which is the subject and the body.

Does anything about this line rub you the wrong way? Anything that seems non-idiomatic about this line? Somebody want to call it out? It's the capital X over here. Why on earth is this a capital X? This should be a lowercase x, because it's a variable in Python. But it's a capital X. This is a case of math idiom trumping Python idiom. In linear algebra, in math in general, this is a matrix. The y is a one-dimensional thing; the X is a multi-dimensional thing. So that's why you'll see these X's capitalized.

So we break it up. And then we break it up even more. What on earth is going on here? Train, test, split. We are essentially saying we want to reserve one third of the data for testing later, and we're only going to train this model on two thirds of the data. Why would you want to do that? The more data that you have to train on, the more accurate your model can potentially be. I will explain in a little bit why you want to reserve this. But that's what we're doing. We're reserving one third of the information for testing later. Finally, we call pipeline.fit, and we pass all of this information, this training data set, in there.
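To make that concrete, here's a rough sketch of how those pieces might fit together. The class name, the query, the connection string, and the parameter values are illustrative guesses rather than our actual code; the stop-word and lemmatizer handling is simplified down to CountVectorizer's built-in English stop-word list; and scikit-learn's own gradient-boosted classifier stands in for the XGBoost one.

```python
import pandas as pd
import sqlalchemy
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


class MessagePreprocessor(BaseEstimator, TransformerMixin):
    """Collapse each chat's subject and body into one string, repeating the
    subject `subject_weight` times so it counts for more than the body."""

    def __init__(self, subject_weight=3):
        self.subject_weight = subject_weight

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        return [
            " ".join([row["subject"]] * self.subject_weight + [row["body"]])
            for _, row in X.iterrows()
        ]


pipeline = Pipeline([
    ("preprocess", MessagePreprocessor(subject_weight=3)),
    ("vectorize", CountVectorizer(stop_words="english")),  # word counts, minus stop words
    ("tfidf", TfidfTransformer()),                         # term-frequency weighting
    # The talk uses xgboost's XGBClassifier, which drops into the same slot;
    # scikit-learn's own gradient-boosted classifier stands in here.
    ("classify", GradientBoostingClassifier()),
])

# Placeholder warehouse connection and table name, not the real ones.
warehouse = sqlalchemy.create_engine("postgresql://user:password@redshift-host:5439/warehouse")
df = pd.read_sql("SELECT category, subject, body FROM chat_messages", warehouse)

# Split off the answer as y, keep the inputs as X, and reserve a third for testing.
y = df["category"]
X = df[["subject", "body"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1 / 3)

pipeline.fit(X_train, y_train)
```

The nice part is that the custom preprocessing and the fitted classifier all end up living in that one pipeline object.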
So we're telling it, that X stuff, those are the independent variables, and we're also telling it, these are the answers that we've already created. So once you've done that, you want to go back and use this test data to validate how it is performing. Why do you need to validate? Because there's something called overfitting. So imagine that these black points are some sort of data that you're interested in. You could fit that with this black line here, a nice straight line. You'll notice it doesn't pass through the points, so there's some error there. It's not perfectly modeling the data. But generally, the black dots tend to be down in the lower left corner and up in the upper right corner. This black line is telling you something about the underlying structure. This blue line perfectly fits the data. It hits every point. But if I take some more observations, do you think they are going to fall perfectly on that blue line? Probably not. This blue line tells you absolutely nothing about the underlying structure of the data. So this is overfitting. With this blue line, it's like you have extracted too much out of your training data set, and it's gibberish. It doesn't really tell you anything about predicting future results. So that's why you want to reserve some data: so you can tell, have I overfit? Would this be able to make reasonable predictions for future inputs?

So testing the model looks like this. We have our X test matrix here. We're throwing it into this predict method of the pipeline. That spits out what the model says the answer should be for this test data. And then we're printing out this classification report that considers what the model predicted and then what the actual assignments were that the humans made. It does some statistic-y things, and then it spits out, this is generally how you are performing. We aren't going to go into the specifics of this; these numbers are bogus and don't actually reflect the data that was thrown in there. But that's the kind of thing you can get out. The score is basically telling you, if this were a one, it means you always got it right. Your model's doing great. If this is low, it's giving you a sense of what maximum performance you can expect for future inputs.

So we've actually trained the model. We've put some information through the pipeline. So we have a model that we can use. And now, in order to put that into some context where it can be used in the future, we want to actually wrap it in a Flask service. So this is what it looks like to actually be able to spit out classifications in an API using Flask. We are defining a route. So we're going to have a service where you can hit a URL that looks like this. This is a messages endpoint. You post JSON data here. You pull messages out of that JSON. Then we're calling the pipeline.predict method again; this is the same thing that we called in testing. And then we're just massaging the data out of those predictions and JSONifying it again into something that we can spit back to the user that requested it. (A rough sketch of this endpoint appears after the demo.) Can you pickle the model? We'll get to that. We'll get to that. The question was about pickling and persisting the model.

OK, so live demo time. This is exciting because it means that I can mess up all sorts of things. So over on the right here, we're going to actually start up this Flask app. It's bundled up in this bin/run executable that we created. So it's now running. We have a Flask app going. And you'll see it spit out log messages as we make requests. So let's try to hit this API.
We get a 405. That's because we said that this should be a POST, and we just made a GET request. It's a 405 because we didn't tell it what to do with a GET request. So let's make this a POST instead. So, -X POST. I'm using curl, which is the command line thing for making HTTP requests. We get a 400 this time. Getting a little bit better. We're making a POST request now, but we aren't actually posting any data. So we need to give it the data we're trying to post. So let's actually put in a whole message here. You can just ignore these two IDs; those are just for tracking, so that when you're making a request, you can understand which message you're getting back. But we're putting in a body and a subject here. And this thing is actually running, and we're going to see what classification it makes for this message. Do we like that class label? The correct class label is urgent. Good job.

So let's make another request here. Let's say instead of this, I am interested in some new feature. So let's say joint accounts. I want to ask about joint accounts. And I'm going to put in my body here: I want to share money with my bae. All right, that ended up being customer education. I think that, yeah, if you do a slightly different phrasing, like "I want to share an account," yeah, OK, I got this to say new product earlier. So you can tell this is not perfect; there was not a material difference between those two. So, the shared accounts were a big news thing that we launched in January. We didn't have joint accounts for a long time, so people were very excited about this. But that was a new product.

All right, anybody want to volunteer information for one last request? And we'll see what it does. I'm curious, on your first one, if you change "lost" to "found," what it does. That's a good question. Let's see: I found my card. Yeah, I'm just editing the lost card message up there to say "found." Yeah, it might still. The ASAP part may mark this as urgent. So, interesting. Do the exclamation marks affect this? They may. I don't remember whether we strip out punctuation. We may strip out punctuation. But I'm looking for a volunteer to put something else into this before we move on. What kind of card protection do you offer? OK, what kind of card protection? I'll say protection, protection. What kind do you offer? What kind do you offer, kind people? Yeah, customer education. A lot of these end up getting labeled with customer education, which I think shows that we have a lot of things tagged customer education in our input.

So I am running lower on time, so I am going to keep on chugging and go back to the slides. So great, we have a toy. All right, all right. So we have a nice toy. It doesn't work perfectly, but it's better than not having any classification at all at this point. And we put it in a web app kind of shape. How do we actually take this to production, and allow our data scientists to do their thing and create models, and allow those to actually get into this application? This was supposed to be, oh yeah, OK, I was supposed to say something about the training wheels coming off. Anyways, here are some things that you might want to consider. I'm going to talk about three steps of stuff that we did to turn this into something more sustainable. So the first step of real life here is separating training the model from actually serving the model.
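Pulling the service side of that demo together, a minimal sketch of the endpoint might look like this. The /messages route, the payload fields, and the my_model module are assumptions based on the demo rather than the real service.

```python
import pandas as pd
from flask import Flask, jsonify, request

from my_model import pipeline  # hypothetical module exposing the trained pipeline

app = Flask(__name__)


@app.route("/messages", methods=["POST"])
def classify_messages():
    # Expect a JSON body like:
    # {"messages": [{"id": "1", "subject": "Lost card", "body": "I lost my card ..."}]}
    messages = request.get_json()["messages"]
    frame = pd.DataFrame(messages)  # the pipeline expects rows with subject and body
    predictions = pipeline.predict(frame)
    return jsonify([
        {"id": message.get("id"), "category": str(category)}
        for message, category in zip(messages, predictions)
    ])


# Roughly what the curl requests in the demo were doing:
#   curl -X POST localhost:5000/messages \
#        -H 'Content-Type: application/json' \
#        -d '{"messages": [{"id": "1", "subject": "Lost card",
#             "body": "I lost my card in St. Louis, please help ASAP!"}]}'
```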
Again, the split is kind of like: data scientists mostly care about training the model, and engineers care about serving the model. The one is kind of a batch-oriented thing; the other is something we want to serve in real time in a service. So to do that, we need to persist the model somewhere and have it someplace where the service can pick it up. And somebody asked the question of, did you use pickle to do this? Unfortunately, yes. Yes, we did. And why did we do that? Because if you look in the documentation for scikit-learn, it has a section about model persistence. And it says, hey, there's built-in support for pickle. It has some issues. If you don't want to use pickle, you're on your own. And that is pretty much it. This is one of the hairiest things, actually. And this is something that I'll talk a little bit about later, what some larger companies do to solve this problem. But we are using pickle.

Issues with pickle, if you aren't familiar with it: pickle is a built-in Python library for taking a Python object and serializing it into something you can put in a file. The issues with it are, number one, that it's essentially code, and if you unpickle something, it gets a chance to just run arbitrary code. So if your friend says, hey, I found this file, you want to unpickle it? Say no. Don't do it. You want to trust where you got your pickle from before you unpickle it. That's true in life, too. Whew. Well, this got exciting.

So in our case, we're using pickle. And the place that we're putting this, we're all in Amazon Web Services, so S3 is the object store in AWS. So we're creating an S3 client. KMS is Amazon's key management service. So what we're doing here is we're dumping our pipeline object to a pickle file. We're then encrypting that pickle file, and we're putting that object to S3. Why are we encrypting it? Again, we're using this key management thing. This is to guarantee that it came from this service that has access to the key. Like, whoever created this pickle had access to the key. And hopefully, it wasn't some rogue engineer that has access to many things. Hopefully, it was actually the service. Yep. So that's what it looks like for us to turn this thing into a pickle and put it in S3.

And then our Flask app pulls that thing down when we start the app. That's what we decided; we have no way of automatically swapping it. If we want to change which model we're using, we just change the config file and restart the service in order to pull in that new updated model, with more training data or with updated logic. And that's one of the really nice things about the scikit-learn pipeline: it packages together all the logic of the pipeline along with the trained state. So you pull this out, you decrypt it, and then you have it available for your app to use.

Step two is providing an environment for doing this batch training and evaluation. I'm going to go through this quickly. This is the actual training of the model. Data scientists could do that on their laptop. In our case, it's not super computationally expensive. So you could just do that, create your model there, and then have the data scientist put it in S3. But we want some more guardrails than that, because what if they pulled in some random version of something or whatever? So when we train the model, we want to know what version of the code was used, what all the libraries are, and to really be able to reproduce this model if it came to it.
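To make that pickle-and-S3 step concrete, here's a rough sketch. The bucket, key, and KMS alias are placeholders, and where we encrypt the pickle with KMS before uploading, this sketch leans on S3's server-side KMS encryption as a simpler stand-in.

```python
import pickle

import boto3

s3 = boto3.client("s3")
BUCKET = "example-model-artifacts"           # placeholder bucket
KEY = "chat-classifier/pipeline.pkl"         # placeholder object key


def save_pipeline(pipeline):
    """Serialize the trained pipeline and store it in S3, KMS-encrypted at rest."""
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=pickle.dumps(pipeline),
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/example-model-key",  # placeholder KMS key alias
    )


def load_pipeline():
    """Fetch the artifact at service startup. Only unpickle from a bucket you trust."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    return pickle.loads(body)
```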
You also might want to do really computationally expensive things when you're training the model. So something that we could do is this grid search thing. What this whole blob is doing is just saying: hey, our pipeline has all these parameters we're passing in, and we don't know if those are the best parameters. Let's try out a whole bunch of different options and a bunch of different combinations. And 5 times 3 times 3 times 2 times 2 times 2, this is already 360 different combinations. Let's just run this whole process to fit the model on each one of those combinations (there's a rough sketch of what that might look like a little further down). And that's where you start being like, well, it would be nice to have an environment to just do all of that stuff and not do it on my laptop. So that's something that you might want to consider.

And step three is, when you actually have your app running, how do you know if it's doing reasonable things? So you want to do all the things that you would do with any other web service. You want to monitor the performance. You have to figure out how you adapt to your production load if there are spikes. And when that happens, do we get pretty consistent performance from the model, or does that vary a lot? How do you degrade gracefully if something goes terribly wrong and other applications that are now relying on getting these predictions back don't get predictions back? And as I said at the beginning, I don't have a whole lot to say on this subject yet, because we have not actually started relying on this yet. So we have a toy. We know that the general approach works. But we shall see how it goes when we actually start relying on this in production.

So the last thing I want to touch on is, what are some other approaches, if you don't want to put together your own batch process and figure out how you're going to do all the persistence and all of that? Some considerations: how big is your team? For us, we have a team of three data engineers right now. We have a team of half a dozen data scientists. But all of us were working on this part time, and it took us three months to build what we built. A lot of companies that really care about machine learning have a whole team of machine learning engineers and such, and this starts to look very different in that case. If you're really big and you have lots of engineering resources to throw at it and you care a whole lot about the performance of the model, what a lot of people will do is actually train the model in scikit-learn or in Python, since data scientists are familiar with that, and they can create some artifact. But you choose some custom serialization format, like dumping it out to some JSON with the parameters. And then you might have an application written in Java or Scala or something else to actually run the model and get better performance that way. And you've got to define your own interface to get from one language to the other. And that takes a whole lot more effort than what we threw at this.

So, how large of a problem space do you need to cover? If you do that sort of thing that I just described, defining your own custom serialization format between one and the other, you have to do that for every different kind of model that you want. So this is something where, right now, we don't know what all kinds of models we might want to use. So we wanted to focus on doing something that we could adapt to other stuff in scikit-learn. And we might want to spin off some completely different model.
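Here is the rough sketch of that grid search promised above; the parameter names and value grids are illustrative, reusing the pipeline and training data from the earlier sketch rather than our actual configuration.

```python
from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid over the pipeline steps defined earlier;
# keys use scikit-learn's "<step name>__<parameter>" convention.
param_grid = {
    "preprocess__subject_weight": [1, 2, 3, 5, 8],       # 5 options
    "vectorize__ngram_range": [(1, 1), (1, 2), (1, 3)],  # 3
    "vectorize__min_df": [1, 5],                         # 2
    "classify__n_estimators": [100, 200, 300],           # 3
    "classify__learning_rate": [0.05, 0.1],              # 2
    "classify__max_depth": [3, 5],                       # 2
}
# 5 * 3 * 2 * 3 * 2 * 2 = 360 combinations, each fit once per cross-validation fold.

search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_score_)
```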
So it's kind of nice that right now we could do something that is significantly different from this natural language processing problem and throw it in there. So those are some of the considerations you might think about. There are also companies that would like to take your money. I don't know how legit any of these are; these are some of the ones that I found. Yhat has a ScienceOps platform. There's Anaconda Enterprise. There's Domino. They all have these nice shiny marketing materials, and they claim that they're going to solve your communication issues and everybody is going to be able to live in their happy land. And there's some variation of: we'll provide infrastructure for you to run this stuff, and your data scientists can click on buttons and decide what version of a model is being promoted. So it solves a bunch of those problems. We didn't go too far down the path of understanding those.

All right, I am going to finish up here. If you want to train and test in a batch environment, even if you want to serve your results in real time, you have this nice concept of serializing something and having this artifact of your trained model that you can then use in a different context. And scikit-learn's pipeline module is really helpful for making that happen. And then, actually serving the model in a real-time context, once you figure the rest of this out, looks a whole lot like serving any other application, and likely the details of your own environment apply there. So thank you. And we have approximately eight minutes for questions. I am on Twitter as Jeff Klukas. I have already posted these slides, so you can see them on Twitter. I will try to remember to post them in Slack, or if somebody would like to go ahead and post that link in Slack, that would be good. What questions do you have?

We have a question here. Once it is in production, do you have any way of learning from the results of the thing? So that is, say, 95% of the queries are classified as customer education; you might be able to detect that there's an issue with your training materials and so forth. Yeah, our plan is, first of all, we're going to be logging all these results to a database. OK, yeah, I will repeat the question. So the question was, how will we know, as we're using this in production and we get answers out of it, how well we're doing? And if it's way overclassifying things into one category, how are we going to react to that? We are planning to persist all of these classifications that happen to a database, so it'll be kind of a flexible data set that we can look at. Also, we're expecting that if we classify things wrong, humans will still end up reclassifying them. So that'll be a very interesting thing to look at, how many of these things get reclassified. And then it's to be seen how much of that we feed back into the model and exactly what parameters we use to filter what we train into new versions of the model.

So as you're prototyping and running it, does it take a lot of resources at that stage? So, does this take a lot of resources while you're prototyping it? That probably depends on what you're doing. But in our case, it's the query to the database that's the expensive thing. So I pull that in, and then I can pretty quickly train the model. In this case, it takes a second or two. But again, that very much depends on what algorithm you're using.
There are more expensive classifiers and more expensive processing that you can do on data. So, you mentioned that engineers end up serving the model. Have you found good efficiencies or good optimizations? Because you just showed it with a Flask app, and more likely different engineering groups are going to have their own different ways of shipping code and stuff. Have you found good optimizations for that, or is that still an open problem? So the question is, what optimizations have we found? We're running a Flask app; how does that perform? And we don't know at this point. We expect it to be pretty low volume at this point, so we're not expecting to see performance issues. But it will be very interesting to see how we're able to handle it. Anyway, we tried putting a pickle file in the repo of the API on GitHub. So if you're using Docker or Kubernetes or something, and you're building that image on every pull request merged to master, you have your pickle file in the repo, it creates the image, and then you use Kubernetes and you have all your pods. And you basically are scaling up, in the sense that your users are going to get different pods serving whatever the model was as of that morning, which is kind of a hand wave. I'm just curious about other implementations of that. Yeah, there's not much we've learned yet in that area.

And in your example, how many messages were in your training set? How many messages were in our training set? I should have an answer for that. Something on the order of 100,000, I think. Over here. What if your data scientists are using a different tool, like R or something? Would this approach be something you could modify to accommodate that? So yeah, if our data scientists were using something like R, would we be able to modify this approach? We kind of decided to go all in on Python. Most of the web services we create are in Scala. Spinning up a Flask app in Python is weird for us, so that's somewhat uncharted territory. But we decided it's worth doing because it unlocks the whole Python data ecosystem, and we can just train the model in Python and then serve it in Python. And that gives us a lot of flexibility for right now. So the short answer is, we would also have to figure out how to serve an R model if we wanted to do that, and that is not something we're planning to do. Yep.

I was wondering if you had any difficulties building a batch training environment in Python, and also getting buy-in to actually use a Python environment. So, OK: did we have trouble building a batch training environment that we could actually get buy-in for? We already happened to have a scheduled task service running that happens to be in Python. It's using Celery to basically be a really dumb and expensive cron solution. So that was something that already existed. We're like, this is Python; we can throw Python code in here. So that's what we're using. It was convenient for our particular situation, and our data scientists were already used to creating jobs in there.

I heard the term reinforcement learning. When you're in production and humans do routine reclassifying, does that get you into the realm of reinforcement learning? So yeah, the question is, when we have humans going in and reclassifying things, does that get into the realm of reinforcement learning if we're throwing that back into the model? I am not super familiar with the term reinforcement learning, so I am not going to make a comment on that, because I don't want to spread misinformation.
But likely, likely yes. So, with the obvious rules approach you might catch phrases like "please help me." Have you seen examples where the machine learning goes past those obvious rules and finds the exceptions? Like someone says, please help me learn about this new product. Have you seen examples of when it manages to find those exceptions? Yeah. So the question is about, if you have a concept of "please help," is it able to find variants of that that a human wasn't anticipating? Not off the top of my head, but if you play around with this, yes, that idea of stemming words, like lemmatizing, works pretty well. So yeah, it's something worth playing around with.

All right, and we'll take one more question. I know you mentioned this in the beginning of the talk, so I apologize if you have to reiterate. What is the goal, I guess, of all this? You already have human beings classifying this information. What are you hoping to get out of having a machine do that? Yeah, so we already have a process for humans classifying these chats; what's the benefit of having a machine do this? It's two things. First of all, it's getting that workload off of a human, so that's a human who can go do something else. I'm sorry, the workload, though, is just picking it from a dropdown, right? Yeah, so the workload is reading through the message, understanding what it is, picking some category from a dropdown, and going through that. So it removes a human having to be dedicated to that task and lets them do what they are more talented at doing. It also reduces the amount of time to get to that classification. So even if you have a triage system set up, maybe somebody is going in once an hour and working down the list of messages. Whereas as soon as the message comes in, within seconds, it can have that first step of classification. So then, for your people that are assigned to working urgent tickets, those are immediately available in that queue. So we're hoping that it's going to bring down the mean time to resolution on those urgent issues. Thank you, everybody. Happy day.